Using the Bootstrap to Estimate the Variance from a Single Systematic PPS Sample

Author

  • Steven Kaufman
Abstract

1.0 Introduction

Systematic sampling (with either equal or unequal selection probabilities) is a common sampling scheme in complex sample designs. It is used because it is simple to implement and because, given a good frame ordering, it can increase efficiency: the ordering acts as an additional stratification. One problem with systematic sampling is that such a sample can be viewed as a cluster sample with a sample size of one cluster. As such, unbiased variance estimation is impossible without additional assumptions.

One common method for approximating the variance under systematic sampling is to treat the sample as a super-stratified sample. This is accomplished by placing the PSUs selected within a stratum in the order in which they were selected and pairing consecutively selected PSUs. Each pair can then be treated as a pseudo-stratum for variance estimation purposes.

The pseudo-stratum variance approach has problems. The main one is that the pseudo-stratum variance still does not reflect the appropriate systematic sampling variance; it may reflect only with-replacement sampling. In addition, the correlation between pseudo-strata is assumed to be zero. At first glance, these drawbacks seem likely to lead to an overestimate of the variance. However, since the correlations can be negative, this need not be the case. Kaufman (1998) shows that the pseudo-stratum approach can produce large underestimates of the variance.

To reduce this problem, the 1998 paper proposes a consistent bootstrap variance estimation procedure. The advantage of the bootstrap methodology is that it makes it possible to reflect an appropriate systematic sampling variance. The problem with this procedure is that, without special adjustments, the bootstrap estimator is biased. Producing an unbiased variance estimator requires adjustments based on estimates from multiple samples. Generally, this is only possible with variables on the frame.
Since the required adjustment depends on the variable of interest, the proposed procedure can have limited utility. In this paper, the frame is randomized in a controlled way, so that some of the effects of the frame ordering on efficiency are maintained while the within and between pseudo-stratum correlations are eliminated. Without the correlations, it becomes possible to estimate the variance in an unbiased fashion, where the expectation is taken across all possible random orderings. With an unbiased variance estimator, the bootstrap variance estimator can be adjusted using only data from a single sample.

The paper is organized as follows: 1) define the randomized systematic sampling, 2) define the bootstrap procedure, 3) describe a simulation study to test the bootstrap variance estimator, and 4) present the results and conclusions.

2.0 Systematic Sampling

Systematic probability proportionate to size (PPS) sampling is a common procedure used with complex sample designs. The procedure is described in Wolter (1985, pp. 283-286). The idea is to divide the frame into consecutive, exhaustive and disjoint groups of Primary Sampling Units (PSUs), called partition groups, such that the total measures of size in the groups are all equal. The total measure of size in a group is called the sampling interval. For this to work, some PSUs must span multiple partition groups. The first sampled PSU is randomly selected from the PSUs in the first partition group. All other PSUs are selected systematically, one per partition group, starting from the point of selection of the first PSU. It is assumed that, before sample selection, PSUs with measures of size larger than the sampling interval have been excluded from the sampling. Such units are considered certainty PSUs.
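The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `systematic_pps`, its parameters, and the use of a uniform random start within the first sampling interval are choices made here for illustration. It assumes certainty PSUs have already been removed from the frame.

```python
import bisect
import itertools
import random


def systematic_pps(sizes, n, rng=random):
    """Select n PSU indices by systematic PPS from an ordered frame.

    `sizes` holds each PSU's measure of size in frame order (hypothetical
    input layout). `rng` is any object with a `uniform(a, b)` method.
    """
    total = sum(sizes)
    # Sampling interval: total measure of size in one partition group.
    interval = total / n
    if max(sizes) > interval:
        raise ValueError("remove certainty PSUs (size > interval) first")
    # Upper cumulative bound of each PSU along the ordered frame.
    cum = list(itertools.accumulate(sizes))
    # Random selection point inside the first partition group.
    start = rng.uniform(0, interval)
    sample = []
    for k in range(n):
        # One selection point per partition group, spaced by the interval.
        point = start + k * interval
        # Index of the first PSU whose cumulative bound exceeds the point.
        idx = bisect.bisect_right(cum, point)
        # Guard against floating-point round-off at the end of the frame.
        sample.append(min(idx, len(sizes) - 1))
    return sample


if __name__ == "__main__":
    frame_sizes = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
    print(systematic_pps(frame_sizes, n=3, rng=random.Random(0)))
```

Under this scheme each non-certainty PSU is selected with probability equal to its measure of size divided by the sampling interval, which is why units larger than the interval have to be set aside as certainty PSUs beforehand.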


Similar resources

Optimum Block Size in Separate Block Bootstrap to Estimate the Variance of Sample Mean for Lattice Data

The statistical analysis of spatial data is usually done under Gaussian assumption for the underlying random field model. When this assumption is not satisfied, block bootstrap methods can be used to analyze spatial data. One of the crucial problems in this setting is specifying the block sizes. In this paper, we present asymptotic optimal block size for separate block bootstrap to estimate the...


Using the Bootstrap in a Two-Stage Nested Complex Sample Design

1.0 Introduction Replication variance estimation for a two-stage nested sample design is usually implemented by generating replicate samples (weights) that replicate the original first-stage sample selection. Since the second stage is nested, the second-stage variance can be reflected by associating each second-stage unit with its respective first-stage unit in each first-stage replicate sample....


Estimation of Variance of Normal Distribution using Ranked Set Sampling

Introduction In some biological, environmental or ecological studies, there are situations in which obtaining exact measurements of sample units is much harder than ranking them in a set of small size without referring to their precise values. In these situations, ranked set sampling (RSS), proposed by McIntyre (1952), can be regarded as an alternative to the usual simple random sampling ...


Statistical Topology Using the Nonparametric Density Estimation and Bootstrap Algorithm

This paper presents approximate confidence intervals for each function of parameters in a Banach space based on a bootstrap algorithm. We apply kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality distribution function estimator of random variables using integrated mean square error (IMSE). The results of simulation studies show a significant impro...


Variance Estimation of Indicators on Social Exclusion and Poverty using the R Package laeken

This vignette illustrates the application of variance estimation procedures to indicators on social exclusion and poverty using the R package laeken. To be more precise, it describes a general framework for estimating variance and confidence intervals of indicators under complex sampling designs. Currently, the package is focused on bootstrap approaches. While the naive bootstrap does not modif...



Publication date: 2002